Exploração inicial dos dados¶

Conjunto de Dados: Vendas de Veículos e Tendências de Mercado¶

O "Vehicle Sales and Market Trends Dataset" oferece uma coleção abrangente de informações relacionadas às transações de venda de diversos veículos.

Detalhes incluídos no conjunto de dados:¶

  • Year: Ano de fabricação do veículo.
  • Make: Marca do veículo (ex: Ford, Toyota, etc.).
  • Model: Modelo do veículo.
  • Trim: Versão ou acabamento específico do modelo.
  • Body Type: Tipo de carroceria (ex: sedan, SUV, hatch).
  • Transmission Type: Tipo de transmissão (manual, automática, etc.).
  • VIN (Vehicle Identification Number): Número de identificação único do veículo.
  • State of Registration: Estado onde o veículo foi registrado.
  • Condition Rating: Classificação da condição do veículo (ex: excelente, bom, regular).
  • Odometer Reading: Quilometragem registrada no hodômetro.
  • Exterior Color: Cor externa do veículo.
  • Interior Color: Cor interna do veículo.
  • Seller Information: Informações sobre o vendedor do veículo.
  • Manheim Market Report (MMR) Values: Valores estimados de mercado segundo o relatório MMR.
  • Selling Prices: Preços reais de venda dos veículos.
  • Sale Dates: Datas em que as vendas ocorreram.

Este conjunto de dados é ideal para análises de mercado automotivo, estudos de depreciação de veículos, avaliação de desempenho de vendas e muito mais.

URL: https://www.kaggle.com/datasets/syedanwarafridi/vehicle-sales-data/data

Load dos dados¶

In [102]:
import kagglehub
from kagglehub import KaggleDatasetAdapter
import os
import pandas as pd
import glob
import shutil
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
from datetime import datetime
import seaborn as sns
import chart_studio.plotly as py
import cufflinks as cf
import plotly.express as px
%matplotlib inline
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
cf.go_offline()
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
warnings.filterwarnings('ignore')
import plotly.graph_objects as go
In [103]:
# pastas
BASE_DIR = os.getcwd()
NOTE_DIR = os.path.dirname(BASE_DIR)
APP_DIR = os.path.dirname(BASE_DIR)
DATA_DIR = os.path.join(APP_DIR, 'data')
DATASET_DIR = os.path.join(DATA_DIR, 'vehicle-sales-data')

# Cria o diretório de dados se não existir
os.makedirs(DATA_DIR, exist_ok=True)
In [104]:
# Baixar a versão mais recente
dataset = kagglehub.dataset_download("syedanwarafridi/vehicle-sales-data")
In [105]:
# Lista os arquivos no conjunto de dados baixado
dataset_files = glob.glob(os.path.join(dataset, "*.csv"))
print("Files in the dataset:")
for file in dataset_files:
    print(f" - {os.path.basename(file)}")
Files in the dataset:
 - car_prices.csv
In [106]:
# Copia os arquivos para o diretório de dados do projeto para acesso mais fácil
os.makedirs(DATASET_DIR, exist_ok=True)
for file in dataset_files:
    dest_file = os.path.join(DATASET_DIR, os.path.basename(file))
    shutil.copy(file, dest_file)
    print(f"Copied {os.path.basename(file)} to {DATASET_DIR}")
Copied car_prices.csv to d:\TOTVS-ATLAS\APP\data\vehicle-sales-data
In [107]:
# Carrega o primeiro arquivo CSV (ajuste se houver vários arquivos)
if dataset_files:
    df = pd.read_csv(dataset_files[0])
    print(f"\nLoaded {os.path.basename(dataset_files[0])}")
    print(f"Shape: {df.shape}")
    
    # Exibe as primeiras linhas
    print("\nPreview of the data:")
    display(df.head())
else:
    print("No CSV files found in the dataset")
Loaded car_prices.csv
Shape: (558837, 16)

Preview of the data:
year make model trim body transmission vin state condition odometer color interior seller mmr sellingprice saledate
0 2015 Kia Sorento LX SUV automatic 5xyktca69fg566472 ca 5.0 16639.0 white black kia motors america inc 20500.0 21500.0 Tue Dec 16 2014 12:30:00 GMT-0800 (PST)
1 2015 Kia Sorento LX SUV automatic 5xyktca69fg561319 ca 5.0 9393.0 white beige kia motors america inc 20800.0 21500.0 Tue Dec 16 2014 12:30:00 GMT-0800 (PST)
2 2014 BMW 3 Series 328i SULEV Sedan automatic wba3c1c51ek116351 ca 45.0 1331.0 gray black financial services remarketing (lease) 31900.0 30000.0 Thu Jan 15 2015 04:30:00 GMT-0800 (PST)
3 2015 Volvo S60 T5 Sedan automatic yv1612tb4f1310987 ca 41.0 14282.0 white black volvo na rep/world omni 27500.0 27750.0 Thu Jan 29 2015 04:30:00 GMT-0800 (PST)
4 2014 BMW 6 Series Gran Coupe 650i Sedan automatic wba6b2c57ed129731 ca 43.0 2641.0 gray black financial services remarketing (lease) 66000.0 67000.0 Thu Dec 18 2014 12:30:00 GMT-0800 (PST)

Analises iniciais (exploração)¶

In [108]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 558837 entries, 0 to 558836
Data columns (total 16 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   year          558837 non-null  int64  
 1   make          548536 non-null  object 
 2   model         548438 non-null  object 
 3   trim          548186 non-null  object 
 4   body          545642 non-null  object 
 5   transmission  493485 non-null  object 
 6   vin           558833 non-null  object 
 7   state         558837 non-null  object 
 8   condition     547017 non-null  float64
 9   odometer      558743 non-null  float64
 10  color         558088 non-null  object 
 11  interior      558088 non-null  object 
 12  seller        558837 non-null  object 
 13  mmr           558799 non-null  float64
 14  sellingprice  558825 non-null  float64
 15  saledate      558825 non-null  object 
dtypes: float64(4), int64(1), object(11)
memory usage: 68.2+ MB
In [109]:
df.apply(lambda x: f"{x.dtype} {x.nunique()} {x.unique()}")
Out[109]:
year            int64 34 [2015 2014 2013 2012 2011 2010 2009 2...
make            object 96 ['Kia' 'BMW' 'Volvo' 'Nissan' 'Chevr...
model           object 973 ['Sorento' '3 Series' 'S60' '6 Seri...
trim            object 1963 ['LX' '328i SULEV' 'T5' ... 'pure'...
body            object 87 ['SUV' 'Sedan' 'Convertible' 'Coupe'...
transmission    object 4 ['automatic' nan 'manual' 'Sedan' 'se...
vin             object 550297 ['5xyktca69fg566472' '5xyktca69f...
state           object 64 ['ca' 'tx' 'pa' 'mn' 'az' 'wi' 'tn' ...
condition       float64 41 [ 5. 45. 41. 43.  1. 34.  2. 42.  3...
odometer        float64 172278 [ 16639.   9393.   1331. ... 20...
color           object 46 ['white' 'gray' 'black' 'red' 'silve...
interior        object 17 ['black' 'beige' 'tan' '—' 'gray' 'b...
seller          object 14263 ['kia motors america  inc' 'finan...
mmr             float64 1101 [ 20500.  20800.  31900. ... 1820...
sellingprice    float64 1887 [ 21500.  30000.  27750. ... 1690...
saledate        object 3766 ['Tue Dec 16 2014 12:30:00 GMT-080...
dtype: object
In [110]:
df.describe()
Out[110]:
year condition odometer mmr sellingprice
count 558837.000000 547017.000000 558743.000000 558799.000000 558825.000000
mean 2010.038927 30.672365 68320.017767 13769.377495 13611.358810
std 3.966864 13.402832 53398.542821 9679.967174 9749.501628
min 1982.000000 1.000000 1.000000 25.000000 1.000000
25% 2007.000000 23.000000 28371.000000 7100.000000 6900.000000
50% 2012.000000 35.000000 52254.000000 12250.000000 12100.000000
75% 2013.000000 42.000000 99109.000000 18300.000000 18200.000000
max 2015.000000 49.000000 999999.000000 182000.000000 230000.000000
In [111]:
df.describe(include='O')
Out[111]:
make model trim body transmission vin state color interior seller saledate
count 548536 548438 548186 545642 493485 558833 558837 558088 558088 558837 558825
unique 96 973 1963 87 4 550297 64 46 17 14263 3766
top Ford Altima Base Sedan automatic automatic fl black black nissan-infiniti lt Tue Feb 10 2015 01:30:00 GMT-0800 (PST)
freq 93554 19349 55817 199437 475915 22 82945 110970 244329 19693 5334
In [112]:
df.isna().sum()
Out[112]:
year                0
make            10301
model           10399
trim            10651
body            13195
transmission    65352
vin                 4
state               0
condition       11820
odometer           94
color             749
interior          749
seller              0
mmr                38
sellingprice       12
saledate           12
dtype: int64
In [113]:
df.nunique()
Out[113]:
year                34
make                96
model              973
trim              1963
body                87
transmission         4
vin             550297
state               64
condition           41
odometer        172278
color               46
interior            17
seller           14263
mmr               1101
sellingprice      1887
saledate          3766
dtype: int64

Limpesa dos dados¶

In [114]:
df.drop(columns=['vin'], inplace=True)
In [115]:
df.dropna(inplace=True)
In [116]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 472325 entries, 0 to 558836
Data columns (total 15 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   year          472325 non-null  int64  
 1   make          472325 non-null  object 
 2   model         472325 non-null  object 
 3   trim          472325 non-null  object 
 4   body          472325 non-null  object 
 5   transmission  472325 non-null  object 
 6   state         472325 non-null  object 
 7   condition     472325 non-null  float64
 8   odometer      472325 non-null  float64
 9   color         472325 non-null  object 
 10  interior      472325 non-null  object 
 11  seller        472325 non-null  object 
 12  mmr           472325 non-null  float64
 13  sellingprice  472325 non-null  float64
 14  saledate      472325 non-null  object 
dtypes: float64(4), int64(1), object(10)
memory usage: 57.7+ MB

Análises Gráficas¶

In [117]:
df.columns.to_list()
Out[117]:
['year',
 'make',
 'model',
 'trim',
 'body',
 'transmission',
 'state',
 'condition',
 'odometer',
 'color',
 'interior',
 'seller',
 'mmr',
 'sellingprice',
 'saledate']
In [118]:
YEAR = df['year'].value_counts()
In [119]:
px.bar(YEAR,title= "Distribuição do ano de produção de todos os carros")
In [120]:
MODEL_LABEL = df['model'].values
In [121]:
MODEL_counts = df['model'].value_counts()
In [122]:
MODEL_count = MODEL_counts[MODEL_counts > 1000]
MODEL_count
Out[122]:
model
Altima              16346
Fusion              12116
F-150               11950
Camry               10986
Escape              10656
                    ...  
HHR                  1040
Avalanche            1023
Frontier             1006
F-350 Super Duty     1005
MKX                  1001
Name: count, Length: 115, dtype: int64
In [123]:
# Cria um gráfico de pizza com tamanho aumentado
fig = go.Figure(data=[go.Pie(labels=MODEL_LABEL, values=MODEL_count)])

# Define o título e aumenta o tamanho
fig.update_layout(
    title_text='Distribuição de Modelos de Carros',
    width=900,                # Aumenta a largura do gráfico
    height=700,               # Aumenta a altura do gráfico
    legend=dict(              # Melhora a legenda
        font=dict(size=10),
        orientation="v"       # Orientação vertical da legenda
    )
)

# Adiciona opções para melhorar a visualização
fig.update_traces(
    textposition='inside',    # Texto dentro das fatias
    textinfo='percent+label', # Mostra percentual e rótulo
    hole=0.3                  # Cria efeito "donut" (opcional)
)

# Exibir o gráfico
fig.show()
In [124]:
px.bar(MODEL_count,title= "Distribuição de ano de produção de todos os carros")
In [125]:
df.groupby("model")["year"].sum()
Out[125]:
model
1 Series      729953
1500         5712217
190-Class       5979
2 Series       64461
200          7478487
              ...   
mdx             2010
tC           1288497
xA             42100
xB            870084
xD            341868
Name: year, Length: 768, dtype: int64
In [126]:
MAKE = df.make.value_counts()
In [127]:
px.bar(MAKE,title= "Distribuição de ano de produção de todos os carros")
In [128]:
MM= pd.crosstab(df.model,df["make"])
In [129]:
px.bar(MM,title= "Distribuição de MM todos os carros")
In [130]:
TRIM = df.trim.value_counts()
In [131]:
TRIM = TRIM[TRIM > 10]
TRIM
Out[131]:
trim
Base                          48779
SE                            39633
LX                            18706
Limited                       15506
LT                            15273
                              ...  
SE S/C                           11
L100                             11
Autobiography                    11
R320 CDI                         11
L 35th Anniversary Edition       11
Name: count, Length: 966, dtype: int64
In [132]:
px.bar(TRIM,title= "Distribuição de TRIM de todos os carros", color_discrete_sequence=["red"])
In [133]:
TTM = pd.crosstab(df.trim,df["make"])
In [134]:
px.line(TTM,title= "Distribuição de TTM de todos os carros")
In [135]:
BODY = df['body'].value_counts()
In [136]:
px.bar(BODY,title= " Distribuição de BODY de todos os carros")
In [137]:
df.columns.to_list()
Out[137]:
['year',
 'make',
 'model',
 'trim',
 'body',
 'transmission',
 'state',
 'condition',
 'odometer',
 'color',
 'interior',
 'seller',
 'mmr',
 'sellingprice',
 'saledate']
In [138]:
transmission_LABEL = df['transmission'].values
transmission_LABEL
Out[138]:
array(['automatic', 'automatic', 'automatic', ..., 'automatic',
       'automatic', 'automatic'], shape=(472325,), dtype=object)
In [139]:
transmission_counts= df['transmission'].value_counts()
transmission_counts
Out[139]:
transmission
automatic    455963
manual        16362
Name: count, dtype: int64
In [140]:
px.bar(transmission_counts,title= "Distribution of BODY all cars,")
In [141]:
pd.crosstab(df["trim"],df["transmission"])
Out[141]:
transmission automatic manual
trim
! 361 38
(1999.5) XE 4 1
+ 1389 22
1 34 10
1.6 18 6
... ... ...
xDrive35d 231 0
xDrive35i 523 0
xDrive35i Premium 244 0
xDrive48i 48 0
xDrive50i 244 0

1494 rows × 2 columns

In [142]:
YT = pd.crosstab(df["year"],df["transmission"])
In [143]:
px.line(YT,title= "Distribution of SY all cars")
In [144]:
plt.figure(figsize=(8, 6))
transmission_counts.plot(kind='pie')
plt.title('A Number of transmission counts ')
plt.show()
No description has been provided for this image
In [145]:
STATE = df['state'].value_counts()
In [146]:
STATE
Out[146]:
state
fl    75243
ca    66213
tx    41657
ga    30939
pa    24117
nj    23003
il    21494
oh    20294
tn    19183
nc    18731
mo    15286
mi    14185
nv    11682
md     9788
va     9448
wi     9196
mn     8627
az     7612
wa     7012
co     6270
ma     6167
ny     4749
in     3879
sc     3754
ne     3685
pr     2427
la     1744
ms     1730
ut     1716
hi     1205
or     1047
nm      163
ok       54
al       25
Name: count, dtype: int64
In [147]:
ST = pd.crosstab(df["state"],df["transmission"])
In [148]:
px.line(ST,title= "Distribution of ST all cars")
In [149]:
SM = pd.crosstab(df["state"],df["make"]).sum()
In [150]:
px.bar(SM,title= "Distribution of SM all cars")
In [151]:
SB = pd.crosstab(df["state"],df["body"])
In [152]:
px.line(SB,title= "Distribution of SB all cars")
In [153]:
SY = pd.crosstab(df["state"],df["year"]).sum()
In [154]:
px.bar(SY,title= "Distribution of SY all cars")
In [155]:
df.columns.to_list()
Out[155]:
['year',
 'make',
 'model',
 'trim',
 'body',
 'transmission',
 'state',
 'condition',
 'odometer',
 'color',
 'interior',
 'seller',
 'mmr',
 'sellingprice',
 'saledate']
In [156]:
Condition= df['condition'].value_counts()
In [157]:
px.bar(Condition,title= "Distribution of SY all cars")
In [158]:
CY = pd.crosstab(df["condition"],df["year"]).sum()
In [159]:
px.line(CY,title= "Distribution of CY all cars")
In [ ]:
 
In [160]:
# Cria um gráfico de pizza
fig = go.Figure(data=[go.Pie(labels=df["year"], values=df["condition"])])
# Define o título
fig.update_layout(title_text='Gráfico de Pizza Exemplo')

# Exibir o gráfico
fig.show()
In [161]:
px.bar(CY,title= "Distribuição de SY de todos os carros")
In [162]:
CMM = pd.crosstab(df["condition"],df["make"])
In [163]:
px.line(CMM,title= "Distribuição de CMM de todos os carros")
In [164]:
px.bar(CMM,title= "Distribuição de CMM de todos os carros")
In [165]:
CM = pd.crosstab(df["condition"],df["model"]).sum()
In [166]:
px.line(CM,title= "Distribuição de CM de todos os carros")
In [167]:
CT = pd.crosstab(df["condition"],df["transmission"])
In [168]:
px.line(CT,title= "Distribuição de CT de todos os carros")
In [169]:
CS = pd.crosstab(df["condition"],df["state"])
In [170]:
CS
Out[170]:
state al az ca co fl ga hi il in la ... or pa pr sc tn tx ut va wa wi
condition
1.0 1 101 1294 24 433 138 4 133 110 71 ... 33 153 29 126 119 439 141 252 76 312
2.0 0 261 2851 126 2538 1184 25 680 263 62 ... 100 747 91 164 337 1411 119 625 301 399
3.0 0 131 1472 212 1554 353 43 1127 81 30 ... 20 811 77 134 144 674 29 158 125 108
4.0 0 197 1756 295 4325 616 45 2082 66 38 ... 28 1483 52 114 481 988 16 347 165 167
5.0 0 43 837 177 1386 553 15 622 47 18 ... 12 427 2 52 786 692 2 165 185 180
11.0 0 0 14 1 8 5 1 2 0 1 ... 0 3 0 0 2 18 1 2 2 3
12.0 0 0 19 2 7 5 0 2 1 1 ... 1 2 0 0 4 14 1 1 2 5
13.0 0 2 8 0 10 3 0 3 1 1 ... 1 3 0 0 0 18 0 3 0 2
14.0 0 1 15 1 20 5 1 6 0 1 ... 0 6 1 0 6 20 0 1 1 3
15.0 0 4 13 1 11 10 1 2 0 1 ... 1 1 0 2 5 27 1 4 3 6
16.0 0 5 17 1 15 8 2 1 2 2 ... 0 3 1 0 5 29 5 4 1 5
17.0 0 3 40 0 27 7 0 8 3 0 ... 0 2 3 4 6 40 2 5 2 3
18.0 0 5 47 1 25 17 4 6 5 1 ... 1 8 1 1 6 61 2 8 4 9
19.0 5 1041 4099 479 6791 2819 94 1508 238 194 ... 49 1076 119 250 888 3110 256 1400 281 916
21.0 1 176 1088 87 1289 588 23 304 60 36 ... 15 268 38 66 195 746 48 254 72 185
22.0 1 105 685 53 861 360 14 175 43 20 ... 13 206 23 49 128 486 27 184 59 124
23.0 1 127 945 74 1209 513 16 266 57 33 ... 21 285 41 48 175 664 36 219 92 143
24.0 1 167 1136 106 1249 609 24 323 55 37 ... 22 315 45 58 206 791 50 226 92 181
25.0 1 215 1527 135 1675 782 33 461 76 32 ... 17 447 66 115 290 1028 64 339 127 230
26.0 1 192 1553 107 1507 783 28 437 71 43 ... 18 411 57 73 259 1028 63 327 117 220
27.0 1 265 2000 205 1992 1063 35 528 106 61 ... 32 604 82 151 420 1357 82 449 182 276
28.0 0 306 2424 184 2188 1282 46 609 131 61 ... 33 750 93 160 495 1641 92 478 217 363
29.0 1 333 2447 193 2199 1165 50 668 119 63 ... 32 742 80 139 464 1634 86 475 217 357
31.0 1 155 1113 79 1093 603 14 299 60 26 ... 17 464 42 85 244 804 31 186 107 157
32.0 0 175 1312 99 1238 604 20 311 59 37 ... 20 511 75 77 281 811 29 213 106 172
33.0 2 189 1414 110 1322 648 21 377 58 31 ... 17 457 63 81 299 841 32 229 140 171
34.0 0 301 2233 168 2178 990 36 600 120 73 ... 41 812 104 148 504 1430 55 302 181 318
35.0 1 458 3514 247 3274 1539 68 919 166 115 ... 50 1236 170 198 855 2139 79 451 304 421
36.0 3 378 3041 269 2924 1292 38 710 156 104 ... 53 1113 157 184 766 1920 78 336 294 393
37.0 0 455 3435 308 3288 1439 54 779 157 96 ... 64 1305 147 188 877 2241 88 332 358 373
38.0 1 340 2647 223 2426 1007 43 574 144 67 ... 26 871 100 131 662 1558 50 220 300 284
39.0 0 279 2699 236 2562 1042 61 641 156 70 ... 55 885 110 123 771 1719 35 213 310 316
41.0 1 285 3113 257 3191 1220 50 676 151 59 ... 50 1233 102 144 925 1749 27 211 334 328
42.0 0 272 3105 312 3514 1266 56 823 182 66 ... 51 1245 102 141 997 1858 19 193 357 370
43.0 1 222 3190 292 3702 1363 58 852 213 45 ... 40 1179 93 133 1170 1897 21 201 380 361
44.0 1 171 2976 361 3796 1453 68 967 204 47 ... 36 1217 95 126 1423 1811 14 159 455 383
45.0 0 68 1375 154 1996 659 27 476 111 18 ... 13 556 44 63 707 870 9 55 211 179
46.0 0 63 1373 169 1892 717 32 668 109 16 ... 13 576 44 60 767 790 7 66 227 203
47.0 0 49 1132 143 1816 687 21 496 80 21 ... 15 555 28 59 729 698 6 47 181 175
48.0 0 34 1140 165 1889 799 15 778 151 17 ... 17 570 46 60 855 806 8 41 208 206
49.0 0 38 1114 214 1823 743 19 595 67 29 ... 20 579 4 47 930 799 5 67 236 189

41 rows × 34 columns

In [171]:
px.bar(CS,title= "Distribuição de Cs de todos os carros")
In [172]:
df.columns.to_list()
Out[172]:
['year',
 'make',
 'model',
 'trim',
 'body',
 'transmission',
 'state',
 'condition',
 'odometer',
 'color',
 'interior',
 'seller',
 'mmr',
 'sellingprice',
 'saledate']
In [173]:
Odometer = df['odometer'].value_counts()
In [174]:
CS = pd.crosstab(df["condition"],df["odometer"]).sum()
In [175]:
CS
Out[175]:
odometer
1.0         979
2.0          10
3.0           5
4.0           7
5.0          13
           ... 
621388.0      1
694978.0      2
959276.0      1
980113.0      1
999999.0     60
Length: 160427, dtype: int64
In [176]:
px.line(CS,title= "Distribuição de CS de todos os carros")
In [177]:
COLOR = df['color'].value_counts()
In [178]:
COLOR
Out[178]:
color
black        93244
white        89233
silver       71251
gray         70640
blue         43060
red          37336
—            22068
gold          9563
green         9234
burgundy      7605
beige         7444
brown         5621
orange        1763
purple        1270
off-white     1244
yellow        1079
charcoal       429
turquoise      190
pink            39
lime            12
Name: count, dtype: int64
In [179]:
px.bar(COLOR,title= "Distribuição de Cor Exterior de todos os carros")
In [180]:
MC = pd.crosstab(df['make'],df["color"])
In [181]:
MC
Out[181]:
color beige black blue brown burgundy charcoal gold gray green lime off-white orange pink purple red silver turquoise white yellow —
make
Acura 33 1253 281 15 106 4 94 967 72 0 28 2 0 8 148 726 2 711 0 53
Aston Martin 0 11 2 0 0 1 0 5 1 0 0 0 0 0 0 2 0 1 0 1
Audi 36 1490 445 35 2 4 36 742 23 0 2 0 0 4 143 703 1 915 3 218
BMW 152 5423 1734 211 55 6 134 3380 205 0 3 13 1 10 506 1825 5 3389 1 456
Bentley 0 32 22 2 0 1 0 19 2 0 0 0 0 0 2 13 0 10 0 2
Buick 270 546 292 196 122 3 286 493 79 0 39 0 0 22 256 887 2 910 0 67
Cadillac 306 1951 372 89 82 9 306 482 66 0 141 0 14 27 356 912 3 1135 0 64
Chevrolet 1479 8977 4865 579 810 66 1506 6819 696 1 52 264 12 138 4832 9774 22 12073 278 907
Chrysler 301 2649 1971 104 315 31 581 1995 200 0 92 7 0 49 1390 2511 11 2630 8 288
Daewoo 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Dodge 297 4855 3273 93 271 71 697 3899 193 1 28 383 1 46 3099 4170 7 5161 85 551
FIAT 8 130 42 16 0 9 0 98 67 0 1 12 0 0 112 64 0 130 21 15
Ferrari 0 2 0 0 0 0 0 3 0 0 0 0 0 0 6 1 0 1 2 2
Fisker 0 3 0 0 1 0 0 2 0 0 0 0 0 0 0 2 0 1 0 0
Ford 608 13416 6311 1041 1313 37 792 9528 1871 3 190 237 4 100 8194 9354 36 18262 131 9585
GMC 264 2032 650 130 181 18 221 1044 115 0 35 6 0 28 598 1167 6 2608 6 108
Geo 0 0 3 0 0 0 2 0 3 0 0 1 0 0 5 0 0 2 0 0
HUMMER 22 215 89 1 17 0 14 86 22 0 0 25 0 0 69 49 1 104 41 13
Honda 340 4838 2861 410 748 9 611 5500 663 0 20 59 0 71 1074 4324 12 2935 6 300
Hyundai 251 3156 2775 374 252 9 305 2814 221 1 23 64 0 34 1776 3561 6 2678 29 330
Infiniti 53 3885 1098 54 76 1 85 3852 75 0 35 6 0 58 229 1073 1 2600 0 830
Isuzu 8 30 14 1 5 0 3 17 20 0 0 1 0 0 20 32 0 27 0 1
Jaguar 25 325 158 3 22 1 50 153 93 0 0 0 0 3 86 177 0 142 0 19
Jeep 217 2903 1113 123 278 20 232 1506 591 2 5 185 0 9 1243 2057 3 1907 94 247
Kia 216 2846 1051 575 588 3 206 2319 517 1 49 77 0 107 1487 2691 2 2762 79 252
Lamborghini 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 2 0
Land Rover 7 498 60 24 5 1 16 173 32 0 1 5 0 3 39 111 1 321 4 21
Lexus 181 2602 689 138 132 47 412 1534 137 0 110 1 1 20 501 1732 1 1795 5 371
Lincoln 137 1162 209 85 113 1 95 389 57 0 92 3 0 12 365 531 0 838 0 837
Lotus 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
MINI 27 242 608 80 9 2 23 302 202 0 48 63 0 10 464 192 1 565 52 56
Maserati 0 35 11 3 6 1 0 14 0 0 1 0 0 0 1 19 0 17 0 1
Mazda 49 1422 718 21 85 10 46 1755 231 0 4 15 0 33 653 876 13 895 7 97
Mercedes-Benz 126 4706 635 28 84 9 227 1588 70 0 6 1 0 17 402 2203 0 2572 3 1331
Mercury 101 225 151 16 47 1 107 155 104 0 7 1 0 1 121 275 0 227 0 32
Mitsubishi 96 632 342 33 45 0 27 532 76 0 7 70 4 6 350 627 2 716 3 93
Nissan 680 9282 3336 542 934 11 556 7151 374 0 123 53 1 306 2928 6417 9 8562 43 2735
Oldsmobile 24 23 19 11 9 0 30 19 25 0 0 0 0 1 32 62 1 61 0 1
Plymouth 0 5 0 0 1 0 0 1 0 0 0 1 0 1 2 2 0 2 1 0
Pontiac 57 621 447 31 115 2 118 515 128 0 1 17 0 13 527 735 8 626 13 44
Porsche 4 348 115 26 15 2 12 160 16 0 1 4 0 2 64 180 0 186 15 7
Ram 5 797 247 80 62 3 26 389 36 0 0 25 0 4 506 484 0 1312 1 67
Rolls-Royce 0 4 1 0 0 0 0 6 0 0 0 0 0 0 0 0 0 4 0 1
Saab 10 118 49 8 6 0 1 74 12 0 2 0 0 0 18 86 0 27 0 5
Saturn 71 414 394 34 59 1 118 279 133 1 0 19 0 16 243 495 2 270 6 15
Scion 4 277 211 16 61 2 3 259 19 0 6 39 0 61 80 197 0 263 8 39
Subaru 69 432 869 40 90 4 73 707 278 0 13 24 0 6 327 661 3 689 2 82
Suzuki 51 153 121 7 33 0 50 99 10 0 6 12 0 1 114 187 0 129 0 4
Tesla 0 5 1 0 0 0 0 8 1 0 0 0 0 0 2 2 0 3 0 1
Toyota 700 5351 2977 254 329 25 1319 6225 1229 0 42 41 1 34 3157 6807 27 5877 22 896
Volkswagen 103 2146 1052 67 95 3 88 2117 176 2 29 19 0 9 629 1583 1 1606 97 958
Volvo 55 707 337 24 26 1 55 445 84 0 2 4 0 0 130 654 1 500 1 56
smart 1 68 38 1 0 0 0 21 9 0 0 4 0 0 49 57 0 76 10 9
In [182]:
px.bar(MC,title= "Distribuição de Cor Exterior de todos os carros")
In [183]:
Interior = df['interior'].value_counts()
In [184]:
Interior
Out[184]:
interior
black        209865
gray         152616
beige         50469
tan           37723
—              9864
brown          7102
red            1128
silver         1002
blue            917
off-white       355
gold            288
purple          287
white           212
green           200
burgundy        154
orange          124
yellow           19
Name: count, dtype: int64
In [185]:
px.bar(Interior,title= "Distribuição de Cor Interior de todos os carros")
In [186]:
Seller = df['seller'].value_counts()
In [187]:
Seller
Out[187]:
seller
ford motor credit company llc      17756
the hertz corporation              16286
nissan-infiniti lt                 15777
santander consumer                 14245
avis corporation                   11471
                                   ...  
nu image auto llc                      1
derek scott's auto park                1
maserati north america inc             1
alternative financial group inc        1
lienemann auto sales inc               1
Name: count, Length: 11923, dtype: int64
In [188]:
Seller = Seller[Seller > 1000]
Seller
Out[188]:
seller
ford motor credit company llc       17756
the hertz corporation               16286
nissan-infiniti lt                  15777
santander consumer                  14245
avis corporation                    11471
                                    ...  
td auto finance                      1137
kevins marysville auto sales inc     1099
ars/enterprise                       1084
oriental bank                        1044
pv holding inc/gdp                   1032
Name: count, Length: 62, dtype: int64
In [189]:
px.line(Seller,title= "Distribuição das entidades vendedoras dos veículos (todos os carros)")
In [190]:
SMM = pd.crosstab(df['seller'],df["make"]).sum()
In [191]:
SMM
Out[191]:
make
Acura             4503
Aston Martin        24
Audi              4802
BMW              17509
Bentley            105
Buick             4470
Cadillac          6315
Chevrolet        54150
Chrysler         15133
Daewoo               2
Dodge            27181
FIAT               725
Ferrari             17
Fisker               9
Ford             81013
GMC               9217
Geo                 16
HUMMER             768
Honda            24781
Hyundai          18659
Infiniti         14011
Isuzu              179
Jaguar            1257
Jeep             12735
Kia              15828
Lamborghini          3
Land Rover        1322
Lexus            10409
Lincoln           4926
Lotus                1
MINI              2946
Maserati           109
Mazda             6930
Mercedes-Benz    14008
Mercury           1571
Mitsubishi        3661
Nissan           44043
Oldsmobile         318
Plymouth            16
Pontiac           4018
Porsche           1157
Ram               4044
Rolls-Royce         16
Saab               416
Saturn            2570
Scion             1545
Subaru            4369
Suzuki             977
Tesla               23
Toyota           35313
Volkswagen       10780
Volvo             3082
smart              343
dtype: int64
In [192]:
px.bar(SMM,title= "Distribuição das entidades responsáveis pela venda dos veículos em todos os carros")
In [193]:
df.columns.to_list()
Out[193]:
['year',
 'make',
 'model',
 'trim',
 'body',
 'transmission',
 'state',
 'condition',
 'odometer',
 'color',
 'interior',
 'seller',
 'mmr',
 'sellingprice',
 'saledate']
In [194]:
df['profit_or_loss'] = df['mmr'] - df['sellingprice']
In [195]:
df['profit_or_loss'].describe()
Out[195]:
count    472325.000000
mean        146.546905
std        1741.175011
min     -207200.000000
25%        -650.000000
50%          50.000000
75%         800.000000
max       87750.000000
Name: profit_or_loss, dtype: float64
In [196]:
PL = df['profit_or_loss'].value_counts()
In [197]:
px.line(PL,title= "Distribuição dos lucros ou prejuízos de todos os carros")